ABSTRACT



This brief presents the case for the reform of assessment 
systems as an essential component of systemic reform. The scope includes ways 
to share conceptual understanding among all stakeholders in the educational 
system, ways to evaluate the impact of systemic reforms in terms of student 
attainment of relevant goals, and ideas on how assessment can play a major 
role in guiding systemic reform. The formidable body of work done to develop 
new assessment systems is described. The ideas contained in this brief are 
generic and equally applicable to science education. (Author/DDR) 



by James Ridgway 





Educational reforms are likely to fail unless
new forms of assessment are implemented 
that reflect new standards. Assessment 
systems play a key role in education: They 
provide the rewards for students, teachers and 
schools and have a major effect on what is taught 
and how it is taught. Appropriate assessment 
schemes can be powerful levers to support 
reform; assessment schemes that do not reflect 
new educational ambitions, however, are barriers 
to progress. 
This Brief sets out the case for the reform of
assessment systems as an essential component of 
systemic reform. The scope of this Brief includes 
ways to share conceptual understanding among all 
the stakeholders in the educational system, ways to 
evaluate the impact of systemic reforms in terms of 
student attainment of relevant goals, and ideas on 
how assessment can play a major role in guiding 
systemic reform. A great deal of work to develop 
new assessment systems has already been done in 
mathematics, some of which is described here. The 
ideas set out in the Brief are generic and
apply equally well to science education. 

Why Do We Need to Revise 
Assessment Systems? 

Public assessments establish the merit and 
competence of students, teachers and 
schools. Students and teachers are reward- 
ed for good performance on the assessment 
system that is in place, no matter what its 
nature. Assessment, therefore, will drive 
(or constrain) educational activities 
(Messick, 1994; Ridgway & Passey, 1993). 
If one is to change classroom activities, it is 
important to align curriculum ambitions 
and assessment practices. New curriculum
goals, such as making connections between
mathematical topics and applying mathematics
to situations, provide pressure to
include less work in the curriculum on
learning topics in isolation or on memorizing
procedures and formulas without
understanding.

However, since almost all “high stakes”
assessments (such as state tests or SATs)
assess mathematical technique on a narrow
range of tasks, a serious conflict exists
between new educational goals and current 
assessment practices. Most current high 
stakes assessment methods present a major 
barrier to educational reform. If reform 
goals are to be promoted, tests that mea- 
sure decontextualized technical skills need 
to be replaced with tests that reflect the 
new intellectual agendas being initiated by 
professional societies, states and districts. 
As an issue of policy, the implementation 
of standards-based curricula should always 
be accompanied by the implementation of 
standards-based assessment. 

A powerful lever for reform would be 
relevant feedback about the progress of 
Systemic Initiatives (SIs) and of new curricula.
If an SI or a curriculum project sets
out to promote fluent use of algebra in 
realistic contexts, for example, there is little 
point assessing its success via a test of 
speeded arithmetic. However, even the 
tasks that comprise the National 
Assessment of Educational Progress (e.g., 
NCES, 1994) or the Third International 
Mathematics and Science Study (TIMSS) 
(e.g., Valverde & Schmidt) fall far short of 
what is required for the assessment of new 
curriculum goals. 

The assessment vacuum poses a serious 
challenge to educational reform, because 
feedback about what is, and what is not, 
effective in promoting new educational 
goals is essential to progress. Establishing 
systems to assess educational progress will 
mean devoting considerable effort to the 
management and reporting of informa- 
tion, both formally and informally. 



What Is 'Balanced 
Assessment'? 

Balanced assessment doesn’t focus on a sin- 
gle theme, such as “technical skill” or 
“authentic performance,” nor does it use a 
single method of assessment, such as mul- 
tiple-choice tests or portfolios. This simple 
starting point opens up a debate at the 
heart of educational reform about the sorts 
of mathematics that students should 
acquire, and about how different aspects of 
performance should be recognized and 
rewarded. If assessment systems are to pro- 
mote reform, they should be designed to 
exemplify the new curriculum goals. That 
is, the balance of assessment tasks should 
mirror the balance of the new curriculum. 

One can identify a number of dimen- 
sions on which mathematics tasks differ, 
such as mathematical content (e.g., num- 
ber, algebra, geometry); task type (e.g., 
technical exercise, nonroutine problem, 
creating a plan); or the circumstances of 
performance (e.g., multiple-choice item, 
60-minute open-response task). Every task 
can be located within the space defined by 
these dimensions. Assessment is “bal- 
anced” if the assembly of tasks used to 
assess student performance samples each 
dimension in an appropriate manner. 

Of course, the term “appropriate man- 
ner” brings with it another set of concep- 
tual problems! There can be no unique 
description of individual tasks (for exam- 
ple, see the Skeleton Tower exercise 
described on page 5), nor of the domain of 
mathematics. Even if there were, it would 
not remain in place for long, as new 
branches of mathematics are invented, and 
as different areas of mathematics rise and 
fall in terms of their perceived relevance to 
school mathematics. Nevertheless, classifi- 
cations are useful, because they draw atten- 
tion to important aspects of mathematics 
that might otherwise be ignored. A number
of classifications are available: from the
National Council of Teachers of
Mathematics, from the New Standards
Project, from state and systemic initiatives,
and from national curriculum framework
documents around the world.

The National Science Foundation 
funded the Balanced Assessment project, 
based at the University of California at 
Berkeley, Harvard University, Michigan 
State University, and two universities in
England, the University of Nottingham 
and the University of Lancaster. The proj- 
ect was aimed at producing ways to assess 
new curriculum goals. (Ridgway & 
Schoenfeld, 1994, details the project’s 
rationale.) The “Dimensions of Balance” 
shown here are adapted from this project. 

Dimensions of Balance 

Mathematical Content will include some 
of the following: 

Number and Quantity, including concepts
and representation; computation,
estimation and measurement; number 
theory and general number properties. 

Algebra, Patterns and Function, includ- 
ing patterns and generalization; func- 
tional relationships (including ratio and 
proportion); graphical and tabular rep- 
resentation; symbolic representation; 
forming and solving relationships. 

Geometry, Space and Shape. 

Handling Data, Statistics and
Probability. 

Other Mathematics. 

Mathematical Process, such as problem 
solving, reasoning and communication, 
will include some of the following: 
Modeling and Formulating. 

Transforming and Manipulating. 
Inferring and Drawing Conclusions. 
Checking and Evaluating. 

Reporting. 







Task Type will be one of the following: 
Open Investigation. 

Nonroutine Problem. 

Design. 

Plan. 

Evaluation and Recommendation. 

Review and Critique. 

Re-presentation of Information.

Technical Exercise. 

Definition of Concepts. 

Goal Type will be one of the following: 
Pure Mathematics. 

Illustrative Application of Mathematics. 

Applied Power over a Practical 
Situation. 

Circumstances of Performance will 
include: 

Task Length. 

Modes of Presentation, including
written, oral, video, computer.

Modes of Working, including
individual, group, mixed.

Modes of Student Response, including 
written, built, spoken, programmed, 
performed. 
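
One way to make the idea of sampling each dimension concrete is to tag every task with its position on the dimensions and then tally a collection of tasks. The sketch below is illustrative only and is not taken from the Balanced Assessment project: the `Task` record, the two example classifications, and the decision to read "appropriate" as "every category is sampled at least once" are all assumptions, and only two of the dimensions are modelled.

```python
from collections import Counter
from dataclasses import dataclass

# Category names copied from the "Dimensions of Balance" list.
CONTENT = (
    "Number and Quantity",
    "Algebra, Patterns and Function",
    "Geometry, Space and Shape",
    "Handling Data, Statistics and Probability",
    "Other Mathematics",
)
TASK_TYPES = (
    "Open Investigation", "Nonroutine Problem", "Design", "Plan",
    "Evaluation and Recommendation", "Review and Critique",
    "Re-presentation of Information", "Technical Exercise",
    "Definition of Concepts",
)


@dataclass(frozen=True)
class Task:
    name: str        # hypothetical label for the task
    content: tuple   # one or more entries from CONTENT
    task_type: str   # exactly one entry from TASK_TYPES
    minutes: int     # task length, one circumstance of performance


def profile(tasks):
    """Count how often each content area and task type is sampled."""
    content = Counter(area for t in tasks for area in t.content)
    types = Counter(t.task_type for t in tasks)
    return content, types


def unsampled(tasks):
    """List the categories that no task in the collection touches."""
    content, types = profile(tasks)
    return ([c for c in CONTENT if content[c] == 0],
            [t for t in TASK_TYPES if types[t] == 0])


# Hypothetical classifications and timings, for illustration only.
tasks = [
    Task("Skeleton Tower", ("Algebra, Patterns and Function",),
         "Nonroutine Problem", 20),
    Task("Speeded arithmetic", ("Number and Quantity",),
         "Technical Exercise", 10),
]
missing_content, missing_types = unsampled(tasks)
```

A tally of this kind only shows which categories are missing from a collection. Deciding how heavily each category should be weighted remains a judgment about the curriculum.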

Classification systems are not neutral 
descriptions. Rather, they are laden with 
beliefs about the nature of mathematics. 
An assessment system, through its tasks 
and scoring schemes, defines the nature of 
mathematics for its adoptive community. 
New frameworks and new assessment sys- 
tems achieve wide acceptance only after a 
great deal of debate: They force an articu- 
lation of what is mathematically valuable, 
and indeed about the nature of mathemat- 
ics. Vigorous debate may well be a good 
thing from the viewpoint of systemic 
reform, for it certainly shows that proposed 
changes are not just “business as usual, but 
with new badges.” 
Developing New Tasks

New tasks require extensive supporting 
materials, such as: 

A clear definition of the core mathematics.
There is strong evidence that
nonexperts judge tasks in terms of their
surface features (“It’s about playing
with Lego,” or “It’s about drawing on
T-shirts”) rather than in terms of their
deep structures: generalization, proof, 
algebra, mathematical notation and 
communication. It is important to 
communicate the core mathematics to 
parents, to students and (sometimes) 
to teachers. 

Administrative details. These details
provide users with a way of quickly
identifying candidate tasks. They
include the student grade level for
which the task is designed; the mathematical
background that students need in order
to tackle the task; the length of time the
task will take to administer; any materials
that teachers need to assemble
before they start; and the way students
are to be grouped in order to perform
the task.

Examples of student work. These are 
essential to demonstrate the mathemat- 
ics that students can produce and to 
show a variety of levels of performance. 

A variety of scoring schemes. Some users
(such as states and SIs) choose to
employ holistic scoring schemes while
others prefer analytic schemes. It is
important to respect these choices.
Devising scoring schemes is a difficult 
process, which requires detailed analysis 
of student scripts and a good deal of 
thought. Users welcome exemplars of 
scoring schemes that fit their current 
practices. The guiding principle for the 
development of any scoring scheme is 
that procedures are in place to ensure 
that the scheme meets acceptable stan- 
dards of reliability and validity. 
Evidence from studies in Vermont, for 
example, shows clearly that process 
skills can be assessed reliably, as do 
studies investigating the Connected 
Mathematics Project, which is 
described below. Issues surrounding the 
psychometrics of performance assess- 
ment are discussed in Phillips (1996). 

A “framework for balance” to guide users 
who want to assemble assessment instru- 
ments. This is an essential tool to help 
shape the local vision of mathematics 
and to plan the evolution of this vision 
over time. 

What Do Tasks Assess? 

Skeleton Tower, shown here, is an exercise 
designed for use in Grades 10 and 11. 
It presents a pattern and then calls for a 
generalization of the pattern, as well as an
explanation (ideally a proof) of why the
generalization holds true. Generalization, 
explanation and proof are deep mathemat- 
ical ideas that give power over mathemati- 
cal situations. 

What does Skeleton Tower assess? That 
depends on how the student approaches 
the task. Every student who completes the 
task successfully must show abilities to 
generalize and to prove and to explain 
mathematical ideas. However, there are a 
number of different routes to success. 







Skeleton Tower 




1. How many cubes are needed to build
this tower? 

2. How many cubes are needed to build a 
tower like this, but 12 cubes high? 

3. Explain how you worked out your 
answer to part 2. 

4. How would you calculate the number 
of cubes needed for a tower n cubes 
high? 



Some students answer the problem by 
considering mathematical series: 

• They add up the blocks in each
“wing” by counting (1 + 2 + 3 + ...),
then multiply by 4 and add on the
central column.

• Or they consider the tower as a set
of horizontal slices, and count the
blocks in each layer (1 + 5 + 9 + ...).

Some students answer the problem by 
rearranging the blocks in the Tower: 

• They imagine tearing off two
opposite “wings,” which they turn
upside down and stick onto the
remaining structure to make a
“wall.”

• Or they imagine tearing the structure
into four pieces and making a
rectangle and a square with these
pieces.
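
The two series routes described above can be checked against each other with a short calculation. This is a sketch rather than part of the original task materials. It assumes, from the description of the routes, a tower with a central column n cubes high and four wings whose columns step down from n - 1 cubes to 1. The closed form n(2n - 1) is derived here from those series and is not quoted from the task.

```python
def cubes_by_wings(n: int) -> int:
    """Four wings of 1 + 2 + ... + (n - 1) cubes, plus the central column of n."""
    wing = sum(range(1, n))
    return 4 * wing + n


def cubes_by_layers(n: int) -> int:
    """Horizontal slices from the top down: 1, 5, 9, ... (n layers, step 4)."""
    return sum(1 + 4 * k for k in range(n))


def cubes_closed_form(n: int) -> int:
    """Generalization for part 4 of the task: n(2n - 1)."""
    return n * (2 * n - 1)


# Both routes and the closed form agree for every height tested.
for height in range(1, 50):
    assert cubes_by_wings(height) == cubes_by_layers(height) == cubes_closed_form(height)

print(cubes_closed_form(12))  # 276 cubes for a tower 12 cubes high
```

The same expression matches the “wall” rearrangement: moving two opposite wings onto the remaining structure gives a rectangle n cubes high and 2n - 1 cubes wide.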




Students who use sums of series are 
showing their prowess in pure mathemat- 
ics. Those who rearrange the structure, on 
the other hand, are showing their spatial 
skills. For students facing a task like this for 
the first time, solving the Skeleton Tower 
might involve a great deal of mathematical 
creativity. For students who have been 
exposed to a more open curriculum, the 
task may instead require the exercise of 
their process skills. And for still other stu- 
dents, Skeleton Tower may even be a mem- 
ory task. Despite these differences, the 
mathematical demand of the task is unam- 
biguous. Better answers can be distin- 
guished from weaker ones, and student 
responses can be scored reliably. 

Key Roles for Assessment in 
Education 

Vivid Communication of New Goals 

Tasks can provide clear illustrations of the 
educational goals of a reformed curricu- 
lum, and tasks with student work are often 
included in state framework documents
and in documents for teachers, parents and 
students. Mathematics, however, is not a 
spectator sport: Vivid communication 
requires more than showing things for 
stakeholders to look at. It requires intellec- 
tual engagement with new ideas, such as 
working on tasks and scoring student 
work. An activity-based approach to communicating
the goals of reform has been
found to be successful when working with 
groups as diverse as mentor teachers, teach- 
ers, parents and community members. 

The Balanced Assessment project 
includes the following activities. Sessions 
start with groups solving a problem and 
then sharing their solutions and the math- 
ematical processes they used. This intro- 
duction gives everyone a good grasp of the 
mathematics contained in the task, and 
exposure to a range of solutions and explanations.
Next, participants are given student
scripts chosen to illustrate a range of
levels of performance. Individuals rank the
scripts, and these ranks form the basis for 
discussions about the aspects of mathemat- 
ics performance that are seen to be of 
value. After discussions, the next stage is to 
devise scoring schemes that use different 
methods, such as holistic judgement or 
allocating points for different parts. 
Working on scoring schemes and student 
scripts provides a sharp focus for discus- 
sions about the core mathematics and how 
student knowledge can be recognized and 
rewarded, which can lead participants to a 
deeper understanding of the educational 
ambitions underlying the reforms. 

Customized Evaluation of Systemic 
Initiatives 

The National Science Foundation empha- 
sizes the need to reform assessment prac- 
tices alongside changes in curricula and 
instructional practice. However, reform of 
state assessment mechanisms is often 
beyond the immediate reach of SIs (especially
urban or rural SIs), and so most SIs
are set in the context of traditional assess- 
ment systems. One approach to the design 
of assessment systems is to take account of 
existing tests (notably those mandated at 
state level) and to complement them with 
tasks designed to make up a fuller picture 
of students’ mathematical competencies. 
This approach is being pursued in a pilot 
study in El Paso, Texas. Scores from the 
existing state test (the Texas Assessment of 
Academic Skills, or TAAS) will provide 
measures of mathematical technique; items 
from TIMSS and NAEP will provide a 
broader range of items for which national 
and international scores are available; and 
tasks from the Balanced Assessment task 
collection will provide a range of perfor- 
mance assessments. 



Evaluating New Curricula

A broad and deep array of evidence is needed
about what is and is not effective curriculum
practice. However, attempts to
assess the effectiveness of a new curriculum 
face a paradox: If traditional tests are used, 
they fail to measure the mathematical skills 
that the new curriculum is trying to teach; if 
tests are based on the curriculum materials, 
then participating students can hardly fail to 
outperform their rivals in the control group. 

An approach that resolves this paradox 
has been used to evaluate the Connected 
Mathematics Project (CMP), which is 
funded by NSF to support middle school 
mathematics. An assessment scheme was 
produced that reflected broad NCTM 
ambitions for the middle school, rather
than CMP ambitions themselves. Five par- 
allel tests were assembled from the 
Balanced Assessment task bank for grades 
6, 7 and 8, designed to assess student per- 
formance on tasks representing NCTM 
standards at those grade levels. The Iowa 
Tests of Basic Skills were used to assess 
technical skill. 

Results showed that one year into the 
program, students following the CMP curriculum
improved more than the control
classes on the BA tasks, and there were
no clear differences on the ITBS tests 
(Zawojewski, Hoover, & Ridgway, 1997). 
In the second and third years of the pro- 
gram, detailed analyses showed significant 
gains in ITBS scores for CMP students, 
compared with students in control classes. 

The Design and Redesign of 
Assessment as a Driver of 
Systemic Reform 

Lack of alignment between educational 
ambition and the assessment system will 
be a major hindrance to the reform 
process (Webb, 1997). Conversely, the 
design and redesign of assessment systems 
that are aligned to reform initiatives can 
act as a “driver” of reform. The revision of 
assessment methods over a period of time 
is likely to cause far less of a shock to an 
educational system than would the intro- 
duction of assessment instruments that 
match all the new educational goals at the 
outset. A policy of incremental change in 
assessment systems can promote change in 
ways that allow professional development activities and
curriculum development to keep up (Chrispeels, 
1997). Collections of carefully validated tasks can 
allow assessment systems to be assembled that incorporate
assessment practices already in place (such as
state-mandated tests or school-based portfolio work) so
as to reflect short-term goals on “balance.” The risk of
political backlash and large-scale resistance from edu- 
cators, which might well overwhelm the efforts at 
reform, can be assessed and used to judge an acceptable 
pace for reform. The planned pace of reform is reflect- 
ed (and made public) in the evolution of high-stakes 
assessment systems. 

Conclusions 

Exclusive dependence on standardized tests of tech- 
nique in mathematics and science poses a substantial 
threat to educational reform. More balanced approach- 
es to assessment have been developed and can be 
tailored to fit local ambitions and circumstances. 
Carefully constructed new styles of task are essential for 
communicating the intellectual heartland of new 
reforms to stakeholders and for monitoring — and 
sometimes for steering — the progress of reform. 
Resources

TIMSS release items (about two-thirds of all the tasks they 
used) can be downloaded from: http://www.csteep.bc.edu/timss

Information on the NAEP 1996 Report Card can be 
obtained from the National Library of Education, Office of 
Educational Research and Improvement, U.S. Department 
of Education, 555 New Jersey Avenue, NW, Washington,
DC 20208-5721.

The NAEP Web site is http://www.ed.gov/NCES/naep 

The NCTM Web site is http://www.nctm.org 

Resources from the Balanced Assessment project, and 
consultancy on a range of issues related to the design of 
assessment methods, can be obtained from the NSF-funded 
Mathematics Assessment Resource Service at: 
http://www.educ.msu.edu/MARS

Shell Center. (1984). Problems with patterns and numbers. 
Manchester, England: Shell Center/JMB 